Monocular Fisheye Camera Depth Estimation Using Semi-supervised Sparse Velodyne Data

Authors

  • Varun Ravi Kumar
  • Stefan Milz
  • Martin Simon
  • Christian Witt
  • Karl Amende
  • Johannes Petzold
  • Senthil Yogamani
Abstract

Near-field depth estimation around a self-driving car is an important function that can be achieved by four wide-angle fisheye cameras with a field of view of over 180°. CNN-based depth estimation produces state-of-the-art results, but progress is hindered because depth annotations cannot be obtained manually. Synthetic datasets are commonly used, but they have limitations: for instance, they do not capture the extensive variability in the appearance of objects such as vehicles present in real datasets. There is also a domain shift when performing inference on natural images, as illustrated by the many attempts to handle domain adaptation explicitly. In this work, we explore an alternative approach of training with sparse LIDAR data as ground truth for fisheye-camera depth estimation. We built our own dataset using our self-driving car setup, which has a 64-beam Velodyne LIDAR and four wide-angle fisheye cameras. LIDAR data projected onto the image plane is sparse and is hence viewed as semi-supervision for dense depth estimation. To handle the difference in viewpoints between the LIDAR and the fisheye camera, an occlusion resolution mechanism was implemented. We started with Eigen's multiscale convolutional network architecture [1] and improved it by modifying the activation function and optimizer. We obtained promising results on our dataset, with RMSE errors better than the state-of-the-art results obtained on KITTI, owing to the vast amount of training data. Our model runs at 20 fps on an Nvidia TX2 embedded platform.
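The semi-supervision described in the abstract, where sparsely projected LIDAR returns serve as ground truth for a dense prediction, amounts to evaluating the error only at pixels that actually receive a LIDAR point. A minimal sketch of such a masked RMSE over flattened depth maps (illustrative only; the paper's exact loss and data layout are not specified in the abstract):

```python
import math

def masked_rmse(pred, lidar_depth):
    """RMSE over only the pixels where a projected LIDAR return exists.

    pred:        dense predicted depth, as a flat list of floats
    lidar_depth: sparse ground truth, 0.0 where no LIDAR point projects
    """
    # Keep only pixel pairs with a valid (nonzero) LIDAR depth.
    valid = [(p, g) for p, g in zip(pred, lidar_depth) if g > 0.0]
    if not valid:
        return 0.0
    sq_err = [(p - g) ** 2 for p, g in valid]
    return math.sqrt(sum(sq_err) / len(sq_err))

# Two of four pixels carry a LIDAR depth; the others are ignored.
print(masked_rmse([1.0, 2.0, 3.0, 4.0], [0.0, 2.5, 0.0, 4.0]))  # ≈ 0.3536
```

In training, the same mask keeps gradients from flowing through unsupervised pixels, so the network still learns a dense prediction from sparse labels.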


Similar Articles

Camera Pose Estimation in Unknown Environments using a Sequence of Wide-Baseline Monocular Images

In this paper, a feature-based technique for camera pose estimation in a sequence of wide-baseline images is proposed. Camera pose estimation is an important issue in many computer vision and robotics applications, such as augmented reality and visual SLAM. The proposed method can track captured images taken by a hand-held camera in room-sized workspaces with a maximum scene depth of 3-4...


Driven to Distraction: Self-Supervised Distractor Learning for Robust Monocular Visual Odometry in Urban Environments

We present a self-supervised approach to ignoring “distractors” in camera images for the purposes of robustly estimating vehicle motion in cluttered urban environments. We leverage offline multi-session mapping approaches to automatically generate a per-pixel ephemerality mask and depth map for each input image, which we use to train a deep convolutional network. At run-time we use the predicte...


On the Importance of Stereo for Accurate Depth Estimation: An Efficient Semi-Supervised Deep Neural Network Approach

We revisit the problem of visual depth estimation in the context of autonomous vehicles. Despite the progress on monocular depth estimation in recent years, we show that the gap between monocular and stereo depth accuracy remains large—a particularly relevant result due to the prevalent reliance upon monocular cameras by vehicles that are expected to be self-driving. We argue that the challenge...


Vision-Based Humanoid Navigation Using Self-Supervised Obstacle Detection

In this article, we present an efficient approach to obstacle detection for humanoid robots based on monocular images and sparse laser data. We particularly consider collision-free navigation with the Nao humanoid, which is the most popular small-size robot nowadays. Our approach first analyzes the scene around the robot by acquiring data from a laser range finder installed in its head. Then, i...


Learning Deeply Supervised Visual Descriptors for Dense Monocular Reconstruction

Visual SLAM (Simultaneous Localization and Mapping) methods typically rely on handcrafted visual features or raw RGB values for establishing correspondences between images. These features, while suitable for sparse mapping, often lead to ambiguous matches at texture-less regions when performing dense reconstruction due to the aperture problem. In this work, we explore the use of learned feature...



Publication date: 2018